CORONAVIRUS FUTURE PREDICTION

Coronavirus disease 2019 (COVID-19) is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first known case was identified in Wuhan, China, in December 2019. The disease has since spread worldwide, leading to an ongoing pandemic.

This project takes a detailed look at COVID-19 data at the global, continental and individual-country levels. The parameters include `Cases`, `Deaths`, `Recoveries`, `Tests`, `Vaccinations` and many more. The project is divided into four distinct sections:

1. Preprocessing, as the name suggests, is where the preprocessing takes place.
2. EDA consists of visualisations and the inferences drawn from them, covering global data, continental comparisons, comparisons between countries, a bivariate analysis of several features across countries, and a check of how well each country's data fits Benford's distribution.
3. ML consists of three forecasting algorithms (SARIMA, Holt-Winters triple exponential smoothing and Facebook's Prophet), followed by an ensemble of all three.
4. Lastly, Citations consists of, you guessed it, citations.
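Benford's law predicts that the leading digit d of many naturally occurring counts appears with probability P(d) = log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. The sketch below is a minimal, pure-Python version of such a check. It scores the fit with the mean absolute deviation (MAD) between observed and expected digit proportions; that choice of measure is an assumption (the notebook may use another test, such as chi-square), and the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a positive count; None for zero, negatives or NaN."""
    if isinstance(value, float) and math.isnan(value):
        return None
    n = int(value)  # counts such as daily cases are whole numbers
    if n <= 0:
        return None
    return int(str(n)[0])


def benford_fit(values):
    """Return observed leading-digit proportions and their MAD from Benford's law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no positive values to test")
    counts = Counter(digits)
    observed = {d: counts[d] / len(digits) for d in range(1, 10)}
    expected = benford_expected()
    mad = sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9
    return observed, mad


# Toy data: a geometric sequence spanning several orders of magnitude,
# which tends to follow Benford's law closely.
growth = [round(1000 * 1.07 ** k) for k in range(200)]
observed, mad = benford_fit(growth)
print(round(observed[1], 3), round(mad, 4))
```

A lower MAD means a closer fit. Values that span only one or two orders of magnitude (for example, a small country's daily deaths) often deviate from Benford's law for that reason alone, so a poor fit is a prompt for a closer look rather than proof that the data is wrong.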
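The ML section combines SARIMA, Holt-Winters and Prophet into an ensemble. As a minimal, dependency-free sketch of two of those ideas, the code below implements additive Holt-Winters triple exponential smoothing (level, trend and seasonal components updated recursively) and averages its forecast with a seasonal-naive forecast using equal weights. The seasonal-naive model is only a stand-in for SARIMA and Prophet, the equal weighting is an assumption, the smoothing parameters are illustrative rather than fitted, and all function names are hypothetical. In practice, `statsmodels.tsa.holtwinters.ExponentialSmoothing` estimates the parameters for you.

```python
def holt_winters_additive(y, m, horizon, alpha=0.5, beta=0.1, gamma=0.1):
    """Additive Holt-Winters forecast for `horizon` steps after the series `y`.

    m is the season length (e.g. 7 for a weekly cycle in daily data).
    alpha, beta and gamma are illustrative smoothing parameters, not fitted ones.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation from the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    seasonal = [y[i] - first for i in range(m)]  # seasonal[i] holds phase i (mod m)

    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        s_old = seasonal[t % m]  # seasonal estimate from one season ago
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old

    n = len(y)
    # Forecast = level + h * trend + latest seasonal estimate for that phase.
    return [level + h * trend + seasonal[(n - 1 + h) % m] for h in range(1, horizon + 1)]


def seasonal_naive(y, m, horizon):
    """Repeat the last observed season (stand-in for the other ensemble members)."""
    if len(y) < m:
        raise ValueError("need at least one full season of data")
    last = y[-m:]
    return [last[(h - 1) % m] for h in range(1, horizon + 1)]


def ensemble_mean(*forecasts):
    """Equal-weight average of several forecasts covering the same horizon."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    if len({len(f) for f in forecasts}) != 1:
        raise ValueError("forecasts must have the same horizon")
    return [sum(values) / len(values) for values in zip(*forecasts)]


history = [10, 20, 30, 20] * 5  # toy series with a season length of 4
hw = holt_winters_additive(history, m=4, horizon=4)
naive = seasonal_naive(history, m=4, horizon=4)
combined = ensemble_mean(hw, naive)
print(combined)  # → [10.0, 20.0, 30.0, 20.0]
```

Averaging tends to cancel out errors that individual models make in different directions, which is the usual motivation for ensembling forecasts; weighting members by their validation error is a common alternative to equal weights.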

Sources of Data

Main Coronavirus Global Data from Our World in Data (OWID) [1] [2]
Recovery Data from Johns Hopkins University (JHU) [3]
Government Response Tracker Data from Oxford [4]
Google Mobility Trends Data [5]
Literacy Rate [6]
Democracy Index [7]


Jump to:

0. Preprocessing

1. EDA

2. Machine Learning

3. Citations

(All the sections linked to above use the imports below, so execute them first.)

If some of the code breaks, the cause may be major changes to the upstream datasets (beyond the daily updates to the figures). In that case, set the variable `STATIC` to `True`, which loads local copies of the files stored in the same directory as this project. The static data is dated `30 August 2021 16:30 IST`.
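A minimal sketch of how such a `STATIC` switch might choose between a remote and a local source. The OWID URL is the CSV location as published in 2021 and may have moved since; the local file name, the dictionary names and both helper functions are hypothetical.

```python
from pathlib import Path

STATIC = False  # True: load the local snapshot dated 30 August 2021 16:30 IST

# Remote source: the OWID CSV URL as published in 2021 (it may have moved since).
REMOTE_SOURCES = {
    "main": "https://covid.ourworldindata.org/data/owid-covid-data.csv",
}
# Hypothetical local file names for the static snapshot.
LOCAL_SOURCES = {
    "main": "owid-covid-data.csv",
}


def resolve_source(name, static=None, base_dir="."):
    """Return the local path (static mode) or the remote URL for dataset `name`."""
    if static is None:
        static = STATIC  # read the switch at call time, not at definition time
    if static:
        return str(Path(base_dir) / LOCAL_SOURCES[name])
    return REMOTE_SOURCES[name]


def load_dataset(name, static=None):
    """Read the chosen source into a DataFrame (pandas.read_csv accepts paths and URLs)."""
    import pandas as pd  # imported here so resolve_source has no pandas dependency
    return pd.read_csv(resolve_source(name, static))
```

Calling `load_dataset("main")` then reads from whichever source the switch selects, so the rest of the notebook does not need to know where the data came from. Reading `STATIC` inside the function (rather than as a default argument) means flipping the flag in a later cell takes effect immediately.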

Main Data [[1]](#cite_1)[[2]](#cite_2)

Recovery Data [[3]](#cite_3)

NOTE: Tracking of recovery data was discontinued from 5th August 2021 onwards.
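Because recoveries stopped being tracked from 5 August 2021, any values reported after that date could otherwise be read by plots and models as real figures. A minimal sketch, assuming those days should be treated as missing rather than dropped, replaces them with `None`; the `(date, recovered)` row shape and the function name are hypothetical. With pandas the same idea would be `df.loc[df["date"] >= "2021-08-05", "recovered"] = np.nan`, where the column names are likewise assumptions.

```python
from datetime import date

# Recovery tracking was discontinued from this date onwards.
RECOVERY_CUTOFF = date(2021, 8, 5)


def mask_untracked_recoveries(rows, cutoff=RECOVERY_CUTOFF):
    """Replace recovery values dated on or after `cutoff` with None (missing).

    `rows` is an iterable of (date, recovered) pairs; this shape is an assumption.
    """
    return [(day, value if day < cutoff else None) for day, value in rows]


rows = [(date(2021, 8, 4), 1200), (date(2021, 8, 5), 0), (date(2021, 8, 6), 0)]
masked = mask_untracked_recoveries(rows)
```

Marking the values as missing (instead of deleting the rows) keeps the date index aligned with the other datasets while letting downstream code skip the untracked period explicitly.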

[[4]](#cite_4)

Government Response Data [[5]](#cite_5)